pacman::p_load(
sf, # For handling spatial data and geometries
tidyverse, # Collection of packages for data manipulation and visualization
tmap, # For creating thematic maps and spatial visualizations
knitr, # For dynamic report generation and R markdown documents
kableExtra, # For creating nice-looking tables in R markdown
janitor, # For cleaning and examining data
skimr, # For providing summary statistics about variables in data frames
stringdist, # For string similarity
ggstatsplot, # For EDA
spdep, # For spatial autocorrelation
GWmodel, # For geographically weighted regression
corrplot # For correlation analysis
)
Take-home Exercise 03
This take-home exercise examines the geography of financial inclusion through geographically weighted regression (GWR) to identify and analyse factors influencing access to financial services. Using the FinScope 2023 dataset for Tanzania, this study will focus on district-level insights, offering a spatial perspective on financial inclusion determinants. The exercise will involve geospatial data wrangling, model diagnostics, and geovisualisation, adhering to the grading criteria for data handling, analytical rigour, effective visual communication, and reproducibility in a Quarto environment. The research is grounded in existing literature on financial inclusion, specifically insights from Tanzania, where financial inclusion plays a pivotal role in economic empowerment and reducing income inequality. The goal is to provide a clear, data-driven understanding of spatial accessibility to financial services, with the results aimed at informing policies for broader economic inclusivity.
The research paper on financial inclusion in Tanzania offers valuable insights into the determinants, barriers, and impacts of financial inclusion, highlighting the role of mobile banking and formal financial services in improving economic well-being. It identifies education and income as key determinants and recognises geographic constraints—such as distance to financial institutions—as significant barriers to access. This aligns closely with the objectives of the take-home exercise, which seeks to model the spatial aspects of financial inclusion at the district level. By employing geographically weighted regression (GWR), this exercise will expand upon the research findings by focusing specifically on spatial variability in financial access. The geographic emphasis in this study offers a nuanced understanding of how location influences financial inclusion, which could lead to targeted interventions to overcome geographic barriers and support underserved regions, thereby complementing the study’s broader socio-economic conclusions.
Understanding Tanzania: A Contextual Introduction
As a Singaporean student analysing financial inclusion in Tanzania, I find it crucial to first understand the country’s unique characteristics. While Singapore and Tanzania might seem vastly different, both share a British colonial history and gained independence in the 1960s. However, their development paths have diverged significantly. Here’s a comprehensive overview of Tanzania that will help frame my analysis:
Why This Context Matters
Before diving into financial inclusion statistics and analysis, understanding Tanzania’s geography, economy, and demographics is essential because:
1. Physical geography influences access to services
2. Economic activities affect financial needs
3. Population distribution impacts service delivery
4. Infrastructure development determines financial service reach
Understanding these features also helps me:
1. Identify potential barriers to financial inclusion
2. Understand regional variations in service access
3. Appreciate the role of mobile money in overcoming geographical challenges
4. Recognise why different regions might need different financial solutions
Coming from Singapore’s context of universal banking access and high technological adoption, this understanding helps me approach the analysis with appropriate context and avoid making assumptions based on my Singaporean experience. The vast differences in geography, population distribution, and economic activities between Tanzania and Singapore highlight why financial inclusion solutions that work in Singapore might not be directly applicable to Tanzania.
Key Comparisons with Singapore
comparison_df <- data.frame(
Aspect = c("Land Area", "Population", "GDP per capita",
"Urbanisation", "Main Economic Sectors", "Capital City"),
Tanzania = c("945,087 km²", "~61 million", "~$1,300 USD",
"35.2% urban", "Agriculture, Mining, Tourism",
"Dodoma (official), Dar es Salaam (de facto)"),
Singapore = c("728 km²", "~5.7 million", "~$75,000 USD",
"100% urban", "Finance, Technology, Trade", "Singapore")
)
kable(comparison_df,
caption = "Key Comparisons between Tanzania and Singapore") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = FALSE) %>%
column_spec(1, bold = TRUE) %>%
column_spec(2:3, width = "20em")
| Aspect | Tanzania | Singapore |
|---|---|---|
| Land Area | 945,087 km² | 728 km² |
| Population | ~61 million | ~5.7 million |
| GDP per capita | ~$1,300 USD | ~$75,000 USD |
| Urbanisation | 35.2% urban | 100% urban |
| Main Economic Sectors | Agriculture, Mining, Tourism | Finance, Technology, Trade |
| Capital City | Dodoma (official), Dar es Salaam (de facto) | Singapore |
rm(comparison_df) # Keep environment clean
Geographical Diversity
Unlike Singapore’s uniform urban landscape, Tanzania presents complex geographical features:
1. Physical Features:
* 800 km Indian Ocean coastline
* Great Rift Valley running through central regions
* Mount Kilimanjaro (Africa’s highest peak)
* Major lakes (Victoria, Tanganyika)
2. Economic Zones:
* Northern Circuit (Tourism)
* Southern Highlands (Agriculture)
* Lake Zone (Fishing, Mining)
* Coastal Zone (Trade, Services)
Development Challenges
As a student from Singapore, I notice several contrasts in development challenges:
1. Infrastructure:
* Transportation networks concentrated in certain regions
* Rural-urban connectivity issues
* Varying quality of telecommunications coverage
2. Economic:
* Large rural population (64.8%)
* Regional economic disparities
* Heavy reliance on agriculture
* Informal sector significance
3. Financial Services:
* 45 licensed banks (mostly in urban areas)
* 32.3 million mobile money accounts
* 65% financial inclusion rate
* Over 100 microfinance institutions
Administrative Structure
Tanzania’s governance structure affects service delivery:
- 31 regions
- 184 districts
- Two capital cities:
* Dodoma (Official capital, centrally located)
* Dar es Salaam (Economic hub, coastal location)
In my subsequent analysis, I’ll refer back to these contextual factors to ensure my interpretation of financial inclusion patterns is grounded in Tanzania’s unique circumstances rather than Singapore’s standards.
Load the packages and examine the shapefile
# Read the shapefile
tz_boundaries <- st_read(dsn = "data/geospatial/",
layer = "geoBoundaries-TZA-ADM2")
Reading layer `geoBoundaries-TZA-ADM2' from data source
`C:\zzzzzuu\ISSS626GAA\Take-home_Ex\Take-home_Ex03\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 170 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 29.58953 ymin: -11.76235 xmax: 40.44473 ymax: -0.983143
Geodetic CRS: WGS 84
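The bounding box above runs from about 29.6°E to 40.4°E. When a projected CRS is needed for distance-based work, it is worth checking how many 6-degree UTM zones that range covers. A minimal sketch in base R (the function name `utm_zone` is illustrative and not from any package used here):

```r
# UTM zone number for a longitude in decimal degrees (standard 6-degree zones).
# Hypothetical helper written for this illustration.
utm_zone <- function(lon) {
  floor((lon + 180) / 6) + 1
}

# Longitudes copied from the bounding box printed above.
west_edge <- 29.58953
east_edge <- 40.44473

zones <- utm_zone(c(west_edge, east_edge))
zones
```

For this bounding box the western edge falls in zone 35 and the eastern edge in zone 37, so the country spans three zones. Any single-zone UTM projection will therefore be least distorted near its own central meridian and more distorted at the opposite side of the country.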
# Correct misspelled Butiama which was discovered below
tz_boundaries <- tz_boundaries %>%
mutate(shapeName = case_when(
shapeName == "Butiam" ~ "Butiama",
TRUE ~ shapeName # keeps all other names unchanged
))
# Check the CRS (Coordinate Reference System)
st_crs(tz_boundaries)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
# Take a quick look at the data
glimpse(tz_boundaries)
Rows: 170
Columns: 6
$ shapeName <chr> "Arusha", "Arusha Urban", "Karatu", "Longido", "Meru", "Mon…
$ shapeISO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ shapeID <chr> "72390352B32479700182608", "72390352B90906351205470", "7239…
$ shapeGroup <chr> "TZA", "TZA", "TZA", "TZA", "TZA", "TZA", "TZA", "TZA", "TZ…
$ shapeType <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((36.86084 -3..., MULTIPOLYGON (…
tmap_mode("plot")
tmap mode set to plotting
# Create a quick plot to verify the import
tm_shape(tz_boundaries) +
tm_borders()
sort(unique(tz_boundaries$shapeName))
[1] "Arusha" "Arusha Urban"
[3] "Babati" "Babati UrbanBabati Urban"
[5] "Bagamoyo" "Bahi"
[7] "Bariadi" "Biharamulo"
[9] "Buhigwe" "Bukoba"
[11] "Bukoba Urban" "Bukombe"
[13] "Bunda" "Busega"
[15] "Butiama" "Chake Chake"
[17] "Chamwino" "Chato"
[19] "Chemba" "Chunya"
[21] "Dodoma Urban" "Gairo"
[23] "Geita" "Hai"
[25] "Hanang" "Handeni"
[27] "Handeni Mji" "Igunga"
[29] "Ikungi" "Ilala"
[31] "Ileje" "Ilemela"
[33] "Iramba" "Iringa"
[35] "Iringa Urban" "Itilima"
[37] "Kahama" "Kahama Township Authority"
[39] "Kakonko" "Kalambo"
[41] "Kaliua" "Karagwe"
[43] "Karatu" "Kaskazini A"
[45] "Kaskazini B" "Kasulu"
[47] "Kasulu Township Authority" "Kati"
[49] "Kibaha" "Kibaha Urban"
[51] "Kibondo" "Kigoma"
[53] "Kigoma Urban" "Kilindi"
[55] "Kilolo" "Kilombero"
[57] "Kilosa" "Kilwa"
[59] "Kinondoni" "Kisarawe"
[61] "Kishapu" "Kiteto"
[63] "Kondoa" "Kongwa"
[65] "Korogwe" "Korogwe Township Authority"
[67] "Kusini" "Kwimba"
[69] "Kyela" "Kyerwa"
[71] "Lindi" "Lindi Urban"
[73] "Liwale" "Longido"
[75] "Ludewa" "Lushoto"
[77] "Mafia" "Mafinga Township Authority"
[79] "Magharibi" "Magu"
[81] "Makambako Township Authority" "Makete"
[83] "Manyoni" "Masasi"
[85] "Masasi Township Authority" "Maswa"
[87] "Mbarali" "Mbeya"
[89] "Mbeya Urban" "Mbinga"
[91] "Mbogwe" "Mbozi"
[93] "Mbulu" "Meatu"
[95] "Meru" "Micheweni"
[97] "Missenyi" "Misungwi"
[99] "Mjini" "Mkalama"
[101] "Mkinga" "Mkoani"
[103] "Mkuranga" "Mlele"
[105] "Momba" "Monduli"
[107] "Morogoro" "Morogoro Urban"
[109] "Moshi" "Moshi Urban"
[111] "Mpanda" "Mpanda Urban"
[113] "Mpwapwa" "Mtwara"
[115] "Mtwara Urban" "Mufindi"
[117] "Muheza" "Muleba"
[119] "Musoma" "Musoma Urban"
[121] "Mvomero" "Mwanga"
[123] "Nachingwea" "Namtumbo"
[125] "Nanyumbu" "Newala"
[127] "Ngara" "Ngorongoro"
[129] "Njombe" "Njombe Urban"
[131] "Nkasi" "Nyamagana"
[133] "Nyang'hwale" "Nyasa"
[135] "Nzega" "Pangani"
[137] "Rombo" "Rorya"
[139] "Ruangwa" "Rufiji"
[141] "Rungwe" "Same"
[143] "Sengerema" "Serengeti"
[145] "Shinyanga" "Shinyanga Urban"
[147] "Siha" "Sikonge"
[149] "Simanjiro" "Singida"
[151] "Singida Urban" "Songea"
[153] "Songea Urban" "Songwe"
[155] "Sumbawanga" "Sumbawanga Urban"
[157] "Tabora Urban" "Tandahimba"
[159] "Tanga Urban" "Tarime"
[161] "Temeke" "Tunduma"
[163] "Tunduru" "Ukerewe"
[165] "Ulanga" "Urambo"
[167] "Uvinza" "Uyui"
[169] "Wanging'ombe" "Wete"
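The map appears accurate; however, several districts are split into separate polygons (for example "Urban", "Township Authority" and "Mji" variants), which is not useful for district-level analysis.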
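To see which districts are affected, one option is to strip the suffixes and look for base names that then occur more than once. A minimal sketch in base R, using a handful of names copied from the listing above (the variable names are illustrative, not objects from this analysis):

```r
# A few names copied from the shapeName listing above.
shape_names <- c("Arusha", "Arusha Urban", "Handeni", "Handeni Mji",
                 "Kahama", "Kahama Township Authority", "Bahi", "Chato")

# Strip the urban-type suffixes to get a candidate base district name.
base_names <- sub(" (Urban|Township Authority|Mji)$", "", shape_names)

# Base names that occur more than once are districts split across polygons.
split_districts <- names(which(table(base_names) > 1))
split_districts
```

Running the same check on the full name vector would give a list of districts to review before any polygons are merged.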
# First, create a function to clean urban district names
clean_urban_names <- function(name) {
# Convert to lowercase for easier matching
name <- tolower(name)
# Remove common urban suffixes and clean up
name <- gsub(" urban$| township authority$| mji$", "", name)
# Clean up specific cases
name <- gsub("babati urbanbabati", "babati", name)
name <- gsub("korogwe township authority", "korogwe", name)
name <- gsub("kigoma urban", "kigoma", name)
name <- gsub("masasi township authority", "masasi", name)
# Capitalize first letter of each word
name <- tools::toTitleCase(name)
name <- trimws(gsub("\\s+", " ", name))
return(name)
}
# Apply the transformation and merge polygons
# Works on tz_boundaries, the sf object read from the shapefile above
tz_boundaries_merged <- tz_boundaries %>%
# Clean the district names
mutate(shapeName = clean_urban_names(shapeName)) %>%
# Group by the cleaned district names
group_by(shapeName) %>%
# Merge the geometries
summarise(
geometry = st_union(geometry),
.groups = "drop"
) %>%
# Fix any invalid geometries after merging
st_make_valid() %>%
# Apply a zero buffer to fix any remaining issues
st_buffer(0) %>%
# Final validation check
st_make_valid()
rm(clean_urban_names)
The presence of numerous islands may pull a district's centroid away from its main landmass, so I need to check centroid placement carefully.
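The effect of an island on a centroid can be illustrated without any spatial packages. The sketch below uses the shoelace formula on a made-up district (a square mainland plus a small offshore island) and compares the area-weighted centroid of both parts with the centroid of the largest part alone. The shapes and the helper `polygon_centroid` are hypothetical.

```r
# Area and centroid of a simple polygon from its vertices (shoelace formula).
# `xy` is a two-column matrix of vertices; the ring is closed internally.
polygon_centroid <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  x2 <- c(x[-1], x[1])
  y2 <- c(y[-1], y[1])
  cross <- x * y2 - x2 * y
  area <- sum(cross) / 2  # signed area
  c(area = abs(area),
    cx = sum((x + x2) * cross) / (6 * area),
    cy = sum((y + y2) * cross) / (6 * area))
}

# Hypothetical district: a 10 x 10 mainland and a 2 x 2 island far offshore.
mainland <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10))
island   <- rbind(c(48, 4), c(50, 4), c(50, 6), c(48, 6))

parts <- rbind(polygon_centroid(mainland), polygon_centroid(island))

# Centroid of the whole multipolygon: area-weighted mean of the part centroids.
combined <- c(cx = weighted.mean(parts[, "cx"], parts[, "area"]),
              cy = weighted.mean(parts[, "cy"], parts[, "area"]))

# Centroid of the largest part only.
largest <- parts[which.max(parts[, "area"]), c("cx", "cy")]

combined
largest
```

The mainland-only centroid sits at the centre of the square, while the combined centroid is pulled towards the island. Keeping only the largest polygon per district is one way to avoid that pull.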
# Clean and transform the spatial data
tz_districts <- tz_boundaries_merged %>%
# Keep only necessary columns
select(district_name = shapeName,
geometry) %>%
# Convert to more appropriate projection for Tanzania
st_transform(crs = 32737) %>% # UTM Zone 37S
# Arrange alphabetically by district name
arrange(district_name)
# Create a more detailed map to verify the data
tm_shape(tz_districts) +
tm_polygons(col = "whitesmoke",
border.col = "gray30",
border.alpha = 0.5) +
tm_layout(main.title = "Tanzania Districts",
main.title.size = 1,
frame = FALSE) +
tm_compass(position = c("right", "top")) +
tm_scale_bar(position = c("left", "bottom"))
# Remove smaller islands in the multipolygons to improve centroid placement
tz_districts_polygon <- tz_districts %>%
st_cast("POLYGON") %>%
mutate(area = st_area(.))
Warning in st_cast.sf(., "POLYGON"): repeating attributes for all
sub-geometries for which they may not be constant
tz_districts_polygon_main <- tz_districts_polygon %>%
group_by(district_name) %>%
filter(area == max(area)) %>%
ungroup() %>%
dplyr::select(-area) %>%
dplyr::select(district_name)
# Calculate centroids
tz_centroids_main <- st_centroid(tz_districts_polygon_main)
Warning: st_centroid assumes attributes are constant over geometries
tz_centroids <- st_centroid(tz_districts)
Warning: st_centroid assumes attributes are constant over geometries
# Create the map for tz_districts_polygon_main with centroids overlay
tmap_mode("view")
tmap mode set to interactive viewing
map1 <- tm_shape(tz_districts_polygon_main) +
tm_polygons() +
tm_shape(tz_centroids_main) +
tm_dots(size = 0.1, col = "red") +
tm_layout(title = "TZ Districts Polygon Main with Centroids")
# Create the map for tz_districts with centroids overlay
map2 <- tm_shape(tz_districts) +
tm_borders() +
tm_shape(tz_centroids) +
tm_dots(size = 0.1, col = "blue") +
tm_layout(title = "TZ Districts with Centroids")
# Arrange the maps side by side
tmap_arrange(map1, map2)
rm(tz_boundaries,
tz_boundaries_merged,
tz_districts,
tz_centroids,
tz_districts_polygon
) # Keep environment clean
Some important notes:
- I’ve transformed the CRS to UTM Zone 37S (EPSG:32737), which suits this analysis because it:
- Uses metres rather than degrees, so distances and areas can be computed directly
- Keeps distortion small near the zone's central meridian (39°E); UTM is conformal rather than equal-area, and distortion grows towards western Tanzania, which lies outside zone 37
- Is suitable for mapping at district level
- The centroid for Uvinza is off-centre, so I will shift it manually.
latitude_shift <- 40000 # Northing shift in metres (projected CRS); value found by trial and error
longitude_shift <- -35000 # Easting shift in metres
# Get coordinates of Uvinza's centroid
uvinza_coords <- st_coordinates(tz_centroids_main$geometry[tz_centroids_main$district_name == "Uvinza"])
# Create new point with shifted coordinates
new_uvinza_point <- st_point(c(
uvinza_coords[1] + longitude_shift,
uvinza_coords[2] + latitude_shift
))
# Create new geometry collection
new_geometries <- tz_centroids_main$geometry
new_geometries[tz_centroids_main$district_name == "Uvinza"] <- st_sfc(new_uvinza_point, crs = st_crs(tz_centroids_main))
# Create new centroids object with shifted Uvinza point
tz_centroids_shifted <- tz_centroids_main
tz_centroids_shifted$geometry <- new_geometries
# Verify the shift
tmap_mode("view") # Switch "view" to "plot" to save computing power
tmap mode set to interactive viewing
tm_shape(tz_districts_polygon_main) +
tm_polygons() +
tm_shape(tz_centroids_shifted) +
tm_dots(size = 0.1, col = "red") +
tm_layout(title = "TZ Districts with Shifted Uvinza Centroid")
latitude_shift <- 20000 # Northing shift in metres; value found by trial and error
longitude_shift <- -25000 # Easting shift in metres
# Get coordinates of Kilombero's centroid
kilombero_coords <- st_coordinates(tz_centroids_shifted$geometry[tz_centroids_shifted$district_name == "Kilombero"])
# Create new point with shifted coordinates
new_kilombero_point <- st_point(c(
kilombero_coords[1] + longitude_shift,
kilombero_coords[2] + latitude_shift
))
# Create new geometry collection
new_geometries2 <- tz_centroids_shifted$geometry
new_geometries2[tz_centroids_shifted$district_name == "Kilombero"] <- st_sfc(new_kilombero_point, crs = st_crs(tz_centroids_shifted))
# Create new centroids object with shifted kilombero point
tz_centroids_shifted2 <- tz_centroids_shifted
tz_centroids_shifted2$geometry <- new_geometries2
# Verify the shift
tmap_mode("view") # Switch "view" to "plot" to save computing power
tmap mode set to interactive viewing
tm_shape(tz_districts_polygon_main) +
tm_polygons() +
tm_shape(tz_centroids_shifted2) +
tm_dots(size = 0.1, col = "red") +
tm_layout(title = "TZ Districts with Shifted Kilombero Centroid")
rm(tz_centroids_shifted)
The shifted centroids for Uvinza and Kilombero now look correct on visual inspection. The map mode can be changed from “view” to “plot” to save computing power.
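The two manual shifts follow the same pattern, so they could be wrapped in one small helper. The sketch below works on a plain data frame of projected coordinates rather than on the sf objects used above. The coordinate values and the function `shift_centroid` are hypothetical; only the shift amounts are taken from the code above.

```r
# Hypothetical centroid table in a projected CRS (coordinates in metres).
centroids <- data.frame(
  district_name = c("Uvinza", "Kilombero", "Bahi"),
  x = c(100000, 500000, 300000),
  y = c(9400000, 9100000, 9350000)
)

# Shift one district's point by dx (easting) and dy (northing), in metres.
shift_centroid <- function(df, district, dx, dy) {
  idx <- df$district_name == district
  stopifnot(sum(idx) == 1)  # fail loudly on a misspelt or duplicated name
  df$x[idx] <- df$x[idx] + dx
  df$y[idx] <- df$y[idx] + dy
  df
}

shifted <- shift_centroid(centroids, "Uvinza", dx = -35000, dy = 40000)
shifted <- shift_centroid(shifted, "Kilombero", dx = -25000, dy = 20000)
shifted
```

If the coordinates were held in a data frame like this, they could be turned back into points with `sf::st_as_sf(shifted, coords = c("x", "y"), crs = 32737)`.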
rm(latitude_shift,
longitude_shift,
uvinza_coords,
kilombero_coords,
new_uvinza_point,
new_kilombero_point,
new_geometries,
new_geometries2,
tz_centroids_main
) # Keep environment clean
Load the Financial Inclusion Survey results
findata <- read_csv("data/aspatial/FinScope Tanzania 2023_Individual Main Data_FINAL.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 9915 Columns: 721
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (703): reg_name, dist_name, ward_code1, ward_name, ea_code, clustertype,...
dbl (13): SN, reg_code, dist_code, c8c, D6_1_1, D6_1_2, D6_1_3, gov_3, cmg4...
lgl (5): e_5_1, e_5_2, g_5_2__5, g_5_2__13, serv2_4
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
findata <- findata %>%
mutate(dist_name = case_when(
dist_name == "Kigamboni" ~ "Temeke",
dist_name == "Arumeru" ~ "Meru",
dist_name == "Ubungo" ~ "Kinondoni",
dist_name == "Kibiti" ~ "Rufiji",
dist_name == "Malinyi" ~ "Ulanga", # These recodes were added after examining the district-name matching below
TRUE ~ dist_name # keeps all other names unchanged
))
# Check for district name misspellings
# Function to find best match for district names
find_best_match <- function(survey_name, shapefile_names) {
# Calculate Jaro-Winkler string distances to every shapefile name
distances <- stringdist(tolower(survey_name),
tolower(shapefile_names),
method = "jw") # Jaro-Winkler distance
# Find the best match (smallest distance)
best_match <- shapefile_names[which.min(distances)]
# Calculate the similarity score (1 - normalized distance)
best_score <- 1 - min(distances)
# Only return match if similarity is high enough
if (best_score >= 0.83) {
return(best_match)
} else {
return(NA_character_)
}
}
# Create matching dictionary
create_district_dictionary <- function(survey_districts, shapefile_districts) {
# Create mapping dataframe
mapping <- data.frame(
survey_name = unique(survey_districts),
stringsAsFactors = FALSE
) %>%
mutate(
# Find best match for each survey district name
shapefile_name = sapply(survey_name,
find_best_match,
shapefile_districts),
# Calculate similarity score
match_score = sapply(survey_name, function(x) {
distances <- stringdist(tolower(x),
tolower(shapefile_districts),
method = "jw")
1 - min(distances)
})
)
# Sort by match score to easily review matches
mapping <- mapping %>%
arrange(desc(match_score))
return(mapping)
}
# Apply the matching
mapping <- create_district_dictionary(
unique(findata$dist_name),
unique(tz_districts_polygon_main$district_name)
)
# Review the matches
# Print matches with low confidence for manual review
low_confidence <- mapping %>%
filter(match_score < 0.83) %>%
arrange(match_score)
print("Low confidence matches that might need manual review:")
[1] "Low confidence matches that might need manual review:"
print(low_confidence)
[1] survey_name shapefile_name match_score
<0 rows> (or 0-length row.names)
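The threshold logic can be illustrated with base R alone. The sketch below swaps in Levenshtein edit distance from `utils::adist()` for the Jaro-Winkler distance used above, so the scores are not comparable with the 0.83 cut-off. The names, the 0.8 threshold and both helper functions are hypothetical. It shows why a spelling variant is matched automatically while a renamed district such as Kigamboni has to be recoded by hand.

```r
# Similarity in [0, 1] from Levenshtein distance, scaled by the longer string.
# Note: this is NOT Jaro-Winkler; it is a base R stand-in for illustration.
edit_similarity <- function(a, b) {
  d <- as.vector(utils::adist(tolower(a), tolower(b)))
  1 - d / pmax(nchar(a), nchar(b))
}

# Return the closest candidate, or NA when nothing is similar enough.
best_match <- function(name, candidates, threshold = 0.8) {
  sims <- edit_similarity(name, candidates)
  if (max(sims) >= threshold) candidates[which.max(sims)] else NA_character_
}

# Hypothetical inputs: two spelling variants and one renamed district.
survey_names <- c("Butiam", "Nyanghwale", "Kigamboni")
shape_names  <- c("Butiama", "Nyang'hwale", "Kinondoni", "Temeke")

matches <- vapply(survey_names, best_match, character(1),
                  candidates = shape_names)
matches
```

A name that comes back as `NA` needs a manual decision. A name that does match still deserves a quick look, because two genuinely different districts can have similar spellings.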
# Create a function to apply the mapping
standardize_district_names <- function(data, mapping) {
data %>%
left_join(mapping %>%
select(survey_name, shapefile_name),
by = c("dist_name" = "survey_name")) %>%
mutate(dist_name_std = coalesce(shapefile_name, dist_name)) %>%
select(-shapefile_name)
}
# Apply standardization to your survey data
findata_standardized <- standardize_district_names(findata, mapping)
# Verify the matching worked
verification <- findata_standardized %>%
group_by(dist_name, dist_name_std) %>%
summarise(count = n(), .groups = "drop") %>%
arrange(dist_name)
# Print summary of changes
cat("\nNumber of districts matched:", sum(!is.na(mapping$shapefile_name)))
Number of districts matched: 144
cat("\nNumber of districts unmatched:", sum(is.na(mapping$shapefile_name)))
Number of districts unmatched: 0
# Display some example matches
cat("\nExample matches (original -> standardized):\n")
Example matches (original -> standardized):
head(mapping %>% filter(!is.na(shapefile_name)), 10) %>%
mutate(mapping = paste(survey_name, "->", shapefile_name)) %>%
pull(mapping) %>%
cat(sep = "\n")
Misungwi -> Misungwi
Missenyi -> Missenyi
Kyela -> Kyela
Kongwa -> Kongwa
Ilala -> Ilala
Iramba -> Iramba
Mbogwe -> Mbogwe
Handeni -> Handeni
Chato -> Chato
Sengerema -> Sengerema
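Fuzzy matching reports how many names found a partner, but not whether the two name sets agree exactly afterwards. A complementary check, sketched here with made-up vectors rather than the real data, is a two-way `setdiff()` on the standardised names:

```r
# Hypothetical name vectors standing in for the survey and shapefile districts.
survey_districts    <- c("Arusha", "Babati", "Bahi", "Kigamboni")
shapefile_districts <- c("Arusha", "Babati", "Bahi", "Temeke", "Wete")

# Survey districts with no polygon: these rows would be lost in a join.
missing_in_shapefile <- setdiff(survey_districts, shapefile_districts)

# Polygons with no survey respondents: these would carry NA values.
missing_in_survey <- setdiff(shapefile_districts, survey_districts)

missing_in_shapefile
missing_in_survey
```

Districts in the second list matter for the later regression, since polygons without survey data have to be dropped or handled explicitly.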
rm(low_confidence,
mapping,
verification,
create_district_dictionary,
find_best_match,
standardize_district_names
) # Keep environment clean
Pick the columns to use for regression
# Select specified columns from the findata dataset
findata_selected <- findata_standardized %>%
select(
# Location and cluster information
dist_name, # District name
clustertype, # Cluster type
# Demographic variables
c8c, # Age
c9, # Gender
c11, # Education status
c14, # Agricultural activity involvement
# Weights
Household_weight, # Household level weight
population_wt, # Population weight
# Derived financial inclusion indicators
MM, # Mobile money usage
BANKED, # Banking services usage
MFI, # Microfinance institution usage
PENSION, # Pension services usage
INSURANCE, # Insurance services usage
SACCO, # SACCO membership/usage
CAPITALM_FUND_MANAGERS, # Capital market/fund manager usage
FORM_INVESTMENTS, # Formal investments
CMG, # Community microfinance group membership
INFORMAL_MONEYLENDER, # Informal moneylender usage
SOCIAL_GROUPS # Social group membership
) %>%
# Clean column names for consistency
clean_names()
# Check the structure of selected data
glimpse(findata_selected)
Rows: 9,915
Columns: 19
$ dist_name <chr> "Misungwi", "Missenyi", "Kyela", "Kongwa", "Ila…
$ clustertype <chr> "Rural", "Rural", "Urban", "Urban", "Urban", "U…
$ c8c <dbl> 47, 63, 74, 29, 53, 39, 24, 55, 45, 56, 51, 36,…
$ c9 <chr> "Female", "Female", "Male", "Female", "Male", "…
$ c11 <chr> "Some primary", "No formal education", "Some pr…
$ c14 <chr> "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", …
$ household_weight <dbl> 1381.5372, 2986.4383, 1434.8197, 2352.7250, 180…
$ population_wt <dbl> 3191.1104, 3675.4824, 2043.7091, 4003.1678, 261…
$ mm <chr> "MM", "Not MM", "MM", "MM", "MM", "MM", "MM", "…
$ banked <chr> "Not Banked", "Not Banked", "Not Banked", "Not …
$ mfi <chr> "Not MFI", "Not MFI", "Not MFI", "Not MFI", "No…
$ pension <chr> "Not PENSION", "Not PENSION", "Not PENSION", "N…
$ insurance <chr> "0", "0", "INSURANCE", "0", "0", "0", "0", "0",…
$ sacco <chr> "Not SACCO", "Not SACCO", "Not SACCO", "Not SAC…
$ capitalm_fund_managers <chr> "Not CAPITALM_FUND_MANAGERS", "Not CAPITALM_FUN…
$ form_investments <chr> "Not FORM_INVESTMENTS", "Not FORM_INVESTMENTS",…
$ cmg <chr> "CMG", "CMG", "CMG", "CMG", "Not CMG", "Not CMG…
$ informal_moneylender <chr> "Not INFORMAL_MONEYLENDER", "Not INFORMAL_MONEY…
$ social_groups <chr> "Not SOCIAL_GROUPS", "Not SOCIAL_GROUPS", "Not …
summary(findata_selected)
dist_name clustertype c8c c9
Length:9915 Length:9915 Min. : 16.00 Length:9915
Class :character Class :character 1st Qu.: 27.00 Class :character
Mode :character Mode :character Median : 37.00 Mode :character
Mean : 39.68
3rd Qu.: 50.00
Max. :100.00
c11 c14 household_weight population_wt
Length:9915 Length:9915 Min. : 50.71 Min. : 73.48
Class :character Class :character 1st Qu.: 515.84 1st Qu.: 1174.56
Mode :character Mode :character Median : 1040.76 Median : 2287.63
Mean : 1427.35 Mean : 3442.69
3rd Qu.: 1745.86 3rd Qu.: 4175.85
Max. :11680.64 Max. :50600.52
mm banked mfi pension
Length:9915 Length:9915 Length:9915 Length:9915
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
insurance sacco capitalm_fund_managers
Length:9915 Length:9915 Length:9915
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
form_investments cmg informal_moneylender social_groups
Length:9915 Length:9915 Length:9915 Length:9915
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
# Get summary statistics of the selected variables
skim(findata_selected)
| Name | findata_selected |
| Number of rows | 9915 |
| Number of columns | 19 |
| _______________________ | |
| Column type frequency: | |
| character | 16 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| dist_name | 0 | 1 | 3 | 12 | 0 | 144 | 0 |
| clustertype | 0 | 1 | 5 | 5 | 0 | 2 | 0 |
| c9 | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| c11 | 0 | 1 | 10 | 41 | 0 | 10 | 0 |
| c14 | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| mm | 0 | 1 | 2 | 6 | 0 | 2 | 0 |
| banked | 0 | 1 | 6 | 10 | 0 | 2 | 0 |
| mfi | 0 | 1 | 3 | 7 | 0 | 2 | 0 |
| pension | 0 | 1 | 7 | 11 | 0 | 2 | 0 |
| insurance | 0 | 1 | 1 | 9 | 0 | 2 | 0 |
| sacco | 0 | 1 | 5 | 9 | 0 | 2 | 0 |
| capitalm_fund_managers | 0 | 1 | 22 | 26 | 0 | 2 | 0 |
| form_investments | 0 | 1 | 16 | 20 | 0 | 2 | 0 |
| cmg | 0 | 1 | 3 | 7 | 0 | 2 | 0 |
| informal_moneylender | 0 | 1 | 20 | 24 | 0 | 2 | 0 |
| social_groups | 0 | 1 | 13 | 17 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| c8c | 0 | 1 | 39.68 | 16.65 | 16.00 | 27.00 | 37.00 | 50.00 | 100.00 | ▇▆▃▂▁ |
| household_weight | 0 | 1 | 1427.35 | 1480.22 | 50.71 | 515.84 | 1040.76 | 1745.86 | 11680.64 | ▇▁▁▁▁ |
| population_wt | 0 | 1 | 3442.69 | 3977.97 | 73.48 | 1174.56 | 2287.63 | 4175.85 | 50600.52 | ▇▁▁▁▁ |
rm(findata,
findata_standardized
) # Keep environment clean
Exploratory Data Analysis
# Summary
summary(findata_selected)
dist_name clustertype c8c c9
Length:9915 Length:9915 Min. : 16.00 Length:9915
Class :character Class :character 1st Qu.: 27.00 Class :character
Mode :character Mode :character Median : 37.00 Mode :character
Mean : 39.68
3rd Qu.: 50.00
Max. :100.00
c11 c14 household_weight population_wt
Length:9915 Length:9915 Min. : 50.71 Min. : 73.48
Class :character Class :character 1st Qu.: 515.84 1st Qu.: 1174.56
Mode :character Mode :character Median : 1040.76 Median : 2287.63
Mean : 1427.35 Mean : 3442.69
3rd Qu.: 1745.86 3rd Qu.: 4175.85
Max. :11680.64 Max. :50600.52
mm banked mfi pension
Length:9915 Length:9915 Length:9915 Length:9915
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
insurance sacco capitalm_fund_managers
Length:9915 Length:9915 Length:9915
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
form_investments cmg informal_moneylender social_groups
Length:9915 Length:9915 Length:9915 Length:9915
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
# Create a series of visualizations
lapply(c("c9", "c11", "clustertype", "c14",
"mm", "banked", "mfi", "pension",
"insurance", "sacco","capitalm_fund_managers",
"form_investments","cmg",
"informal_moneylender","social_groups"), function(var) {
# .data[[var]] replaces the deprecated aes_string()
ggplot(findata_selected, aes(x = .data[[var]])) +
geom_bar() +
theme_minimal() +
labs(title = paste("Distribution of", var),
x = var,
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
})
[Figures: 15 bar charts, one per variable listed in the code above, each titled "Distribution of <variable>" with the variable on the x-axis and Count on the y-axis]
Looking at the bar charts of the character variables, I need to aggregate each one into a district-level variable. The only non-binary variable is education (c11), so I will convert it to an ordinal education score.
# Create education score mapping
education_scores <- c(
"Don’t know" = 0,
"No formal education" = 1,
"Post primary technical training" = 4, # After primary but before secondary
"Primary completed" = 3,
"Secondary competed-O level" = 5,
"Secondary completed-A level" = 6,
"Some primary" = 2,
"Some secondary" = 4,
"Some University or other higher education" = 6,
"University or other higher education" = 7,
"University or higher education completed" = 8
)
# Apply scores to your data
findata_selected_ed <- findata_selected %>%
mutate(education_score = education_scores[c11])
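A named-vector lookup like this returns `NA` for any label that is not an exact match, including differences in spacing or apostrophe style. A minimal sketch of a check for unmatched labels, using made-up labels and responses rather than the survey data:

```r
# Hypothetical scores and responses; the labels are illustrative only.
scores <- c("No formal education" = 1, "Some primary" = 2,
            "Primary completed" = 3)
responses <- c("Some primary", "Primary completed",
               "Primary  completed", "Some primary")

# unname() drops the label names that a named-vector lookup carries along.
education_score <- unname(scores[responses])

# Labels that failed to match the lookup (here, one with a doubled space).
unmatched <- unique(responses[is.na(education_score)])

education_score
unmatched
```

An empty `unmatched` vector would indicate that every response received a score. Any entries point to labels whose spelling in the lookup differs from the data.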
rm(education_scores, findata_selected)
Group and aggregate the survey data to form district-level data
district_summary <- findata_selected_ed %>%
group_by(dist_name) %>%
summarise(
# Urbanity - count and percentage of urban areas
urban_count = sum(clustertype == "Urban", na.rm = TRUE),
urban_pct = mean(clustertype == "Urban", na.rm = TRUE) * 100,
# Demographics
median_age = median(as.numeric(c8c), na.rm = TRUE),
average_ed = mean(as.numeric(education_score), na.rm = TRUE),
male_count = sum(c9 == "Male", na.rm = TRUE),
male_pct = mean(c9 == "Male", na.rm = TRUE) * 100,
agriculture_count = sum(c14 == "Yes", na.rm = TRUE),
agriculture_pct = mean(c14 == "Yes", na.rm = TRUE) * 100,
# Financial services counts and percentages
mobile_money_count = sum(mm == "MM", na.rm = TRUE),
mobile_money_pct = mean(mm == "MM", na.rm = TRUE) * 100,
bank_count = sum(banked == "Banked", na.rm = TRUE),
bank_pct = mean(banked == "Banked", na.rm = TRUE) * 100,
mfi_count = sum(mfi == "MFI", na.rm = TRUE),
mfi_pct = mean(mfi == "MFI", na.rm = TRUE) * 100,
pension_count = sum(pension == "PENSION", na.rm = TRUE),
pension_pct = mean(pension == "PENSION", na.rm = TRUE) * 100,
insurance_count = sum(insurance == "INSURANCE", na.rm = TRUE),
insurance_pct = mean(insurance == "INSURANCE", na.rm = TRUE) * 100,
sacco_count = sum(sacco == "SACCO", na.rm = TRUE),
sacco_pct = mean(sacco == "SACCO", na.rm = TRUE) * 100,
capital_count = sum(capitalm_fund_managers == "CAPITALM_FUND_MANAGERS", na.rm = TRUE),
capital_pct = mean(capitalm_fund_managers == "CAPITALM_FUND_MANAGERS", na.rm = TRUE) * 100,
invest_count = sum(form_investments == "FORM_INVESTMENTS", na.rm = TRUE),
invest_pct = mean(form_investments == "FORM_INVESTMENTS", na.rm = TRUE) * 100,
cmg_count = sum(cmg == "CMG", na.rm = TRUE),
cmg_pct = mean(cmg == "CMG", na.rm = TRUE) * 100,
moneylender_count = sum(informal_moneylender == "INFORMAL_MONEYLENDER", na.rm = TRUE),
moneylender_pct = mean(informal_moneylender == "INFORMAL_MONEYLENDER", na.rm = TRUE) * 100,
social_count = sum(social_groups == "SOCIAL_GROUPS", na.rm = TRUE),
social_pct = mean(social_groups == "SOCIAL_GROUPS", na.rm = TRUE) * 100,
# Total responses per district
total_respondents = n()
) %>%
# Round all percentage columns to 2 decimal places
mutate(across(ends_with("_pct"), ~round(., 2)))
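The count and percentage pairs above all follow the same pattern, and the percentages use only non-missing answers as the denominator, whereas `total_respondents` counts every row. A minimal sketch of that pattern on toy data is shown here; the helper `pct_equal()` and the toy values are hypothetical and are not part of the original analysis.

```r
library(dplyr)

# Toy survey data; district names and answers are illustrative only
toy <- tibble(
  dist_name = c("A", "A", "A", "B", "B"),
  mm        = c("MM", "MM", NA, "MM", "Not MM"),
  banked    = c("Banked", "Not banked", "Banked", "Not banked", "Not banked")
)

# Hypothetical helper: percentage of non-missing answers equal to `level`
pct_equal <- function(x, level) round(mean(x == level, na.rm = TRUE) * 100, 2)

toy_summary <- toy %>%
  group_by(dist_name) %>%
  summarise(
    mobile_money_count = sum(mm == "MM", na.rm = TRUE),
    mobile_money_pct   = pct_equal(mm, "MM"),
    bank_pct           = pct_equal(banked, "Banked"),
    total_respondents  = n()
  )

print(toy_summary)
```

In district A the missing `mm` answer is excluded from the percentage but still counted in `total_respondents`, so the count divided by the total does not always reproduce the percentage.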
# View the first few rows
head(district_summary)
# A tibble: 6 × 32
dist_name urban_count urban_pct median_age average_ed male_count male_pct
<chr> <int> <dbl> <dbl> <dbl> <int> <dbl>
1 Arusha 75 100 35 4.07 29 38.7
2 Babati 15 14.3 34 3.23 43 41.0
3 Bagamoyo 29 39.7 38 3.41 34 46.6
4 Bahi 0 0 34 1.8 17 37.8
5 Bariadi 30 40 33 2.83 30 40
6 Biharamulo 0 0 30 2.49 22 48.9
# ℹ 25 more variables: agriculture_count <int>, agriculture_pct <dbl>,
# mobile_money_count <int>, mobile_money_pct <dbl>, bank_count <int>,
# bank_pct <dbl>, mfi_count <int>, mfi_pct <dbl>, pension_count <int>,
# pension_pct <dbl>, insurance_count <int>, insurance_pct <dbl>,
# sacco_count <int>, sacco_pct <dbl>, capital_count <int>, capital_pct <dbl>,
# invest_count <int>, invest_pct <dbl>, cmg_count <int>, cmg_pct <dbl>,
# moneylender_count <int>, moneylender_pct <dbl>, social_count <int>, …
# Check for any districts with suspicious values
summary(district_summary)
dist_name urban_count urban_pct median_age
Length:144 Min. : 0.00 Min. : 0.00 Min. :29.00
Class :character 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:34.00
Mode :character Median : 15.00 Median : 23.07 Median :37.00
Mean : 23.03 Mean : 25.75 Mean :37.19
3rd Qu.: 30.00 3rd Qu.: 34.88 3rd Qu.:40.00
Max. :181.00 Max. :100.00 Max. :50.50
average_ed male_count male_pct agriculture_count
Min. :1.600 Min. : 4.00 Min. :26.67 Min. : 8.00
1st Qu.:2.578 1st Qu.:19.00 1st Qu.:39.91 1st Qu.: 29.00
Median :2.900 Median :29.00 Median :43.91 Median : 44.00
Mean :2.967 Mean :30.47 Mean :44.27 Mean : 48.92
3rd Qu.:3.278 3rd Qu.:40.00 3rd Qu.:48.89 3rd Qu.: 66.00
Max. :4.446 Max. :83.00 Max. :61.33 Max. :122.00
agriculture_pct mobile_money_count mobile_money_pct bank_count
Min. : 4.42 Min. : 10.00 Min. :30.00 Min. : 0.00
1st Qu.: 67.67 1st Qu.: 24.75 1st Qu.:60.00 1st Qu.: 5.00
Median : 83.89 Median : 42.00 Median :70.33 Median : 9.00
Mean : 76.27 Mean : 48.93 Mean :68.61 Mean :14.13
3rd Qu.: 92.03 3rd Qu.: 66.00 3rd Qu.:80.00 3rd Qu.:19.25
Max. :100.00 Max. :167.00 Max. :95.17 Max. :87.00
bank_pct mfi_count mfi_pct pension_count
Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.000
1st Qu.:10.83 1st Qu.: 1.000 1st Qu.: 2.310 1st Qu.: 0.000
Median :17.78 Median : 3.000 Median : 4.785 Median : 1.000
Mean :18.51 Mean : 4.708 Mean : 6.068 Mean : 2.778
3rd Qu.:24.14 3rd Qu.: 6.000 3rd Qu.: 8.367 3rd Qu.: 4.000
Max. :48.07 Max. :34.000 Max. :22.670 Max. :18.000
pension_pct insurance_count insurance_pct sacco_count
Min. : 0.000 Min. : 0.00 Min. : 0.00 Min. :0.0000
1st Qu.: 0.000 1st Qu.: 2.00 1st Qu.: 4.44 1st Qu.:0.0000
Median : 2.245 Median : 5.00 Median : 8.89 Median :0.0000
Mean : 3.548 Mean : 7.09 Mean : 9.42 Mean :0.8819
3rd Qu.: 6.670 3rd Qu.: 9.00 3rd Qu.:13.33 3rd Qu.:1.0000
Max. :17.330 Max. :47.00 Max. :30.99 Max. :9.0000
sacco_pct capital_count capital_pct invest_count
Min. : 0.000 Min. :0.0000 Min. :0.0000 Min. : 0.000
1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.750
Median : 0.000 Median :0.0000 Median :0.0000 Median : 1.000
Mean : 1.222 Mean :0.2708 Mean :0.3232 Mean : 2.924
3rd Qu.: 1.840 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.: 4.000
Max. :13.330 Max. :9.0000 Max. :6.6700 Max. :20.000
invest_pct cmg_count cmg_pct moneylender_count
Min. : 0.0000 Min. : 0.000 Min. : 0.00 Min. : 0.000
1st Qu.: 0.6225 1st Qu.: 3.000 1st Qu.: 6.67 1st Qu.: 1.000
Median : 2.6700 Median : 7.000 Median :10.00 Median : 2.000
Mean : 3.7360 Mean : 8.257 Mean :12.47 Mean : 2.944
3rd Qu.: 6.6700 3rd Qu.:12.000 3rd Qu.:17.33 3rd Qu.: 4.000
Max. :17.3300 Max. :32.000 Max. :40.91 Max. :20.000
moneylender_pct social_count social_pct total_respondents
Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 15.00
1st Qu.: 1.330 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 45.00
Median : 2.870 Median : 1.000 Median : 2.070 Median : 60.00
Mean : 4.401 Mean : 2.076 Mean : 3.136 Mean : 68.85
3rd Qu.: 6.670 3rd Qu.: 3.000 3rd Qu.: 4.482 3rd Qu.: 90.00
Max. :20.000 Max. :13.000 Max. :22.220 Max. :181.00
Join aspatial data with geospatial data
# Join the spatial data
district_summary_spatial <- tz_districts_polygon_main %>%
left_join(district_summary,
by = c("district_name" = "dist_name")) %>%
st_as_sf() # ensure it remains as spatial object
# Join centroids
district_summary_spatial <- district_summary_spatial %>%
left_join(
tz_centroids_shifted2 %>%
st_set_geometry(NULL) %>% # remove geometry before joining
select(district_name, everything()),
by = c("district_name" = "district_name")
)
# Check the join results
print(paste("Number of districts in summary:", nrow(district_summary)))
[1] "Number of districts in summary: 144"
print(paste("Number of districts after spatial join:", nrow(district_summary_spatial)))
[1] "Number of districts after spatial join: 147"
print(paste("Number of districts with centroids:", sum(!is.na(district_summary_spatial$geometry))))
[1] "Number of districts with centroids: 147"
# Check for any districts that didn't match
missing_districts <- district_summary_spatial %>%
filter(is.na(total_respondents)) %>%
pull(district_name)
if(length(missing_districts) > 0) {
print("Districts without survey data:")
print(missing_districts)
}
[1] "Districts without survey data:"
[1] "Kaskazini a" "Kaskazini b" "Korogwe" "Mafia" "Mafinga"
[6] "Magharibi" "Makambako" "Tunduma"
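The check above only looks in one direction: polygons that received no survey rows. The counts printed earlier (144 districts in the summary, 147 rows after the join, 8 of them without survey data) suggest that a few survey districts may also have found no polygon, and a left join drops those silently. A sketch of checking both directions with `dplyr::anti_join()` follows; the two toy tables and their values are hypothetical stand-ins, not the real layers.

```r
library(dplyr)

# Toy stand-ins for the polygon layer and the survey summary (illustrative names)
polygons <- tibble(district_name = c("Arusha", "Babati", "Mafia"))
survey   <- tibble(dist_name = c("Arusha", "Babati", "Arusha Urban"),
                   total_respondents = c(10, 20, 30))

# Polygons with no matching survey rows (the direction checked above)
no_survey <- anti_join(polygons, survey, by = c("district_name" = "dist_name"))

# Survey districts with no matching polygon (dropped silently by a left join)
no_polygon <- anti_join(survey, polygons, by = c("dist_name" = "district_name"))

print(no_survey)
print(no_polygon)
```

Names that appear only in `no_polygon` are usually spelling or naming variants that can be reconciled before the join.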
district_summary_spatial <- district_summary_spatial %>%
drop_na(mobile_money_count)
EDA to examine data
summary(district_summary_spatial)
district_name geometry urban_count urban_pct
Length:139 POLYGON :139 Min. : 0.00 Min. : 0.00
Class :character epsg:32737 : 0 1st Qu.: 0.00 1st Qu.: 0.00
Mode :character +proj=utm ...: 0 Median : 15.00 Median : 22.41
Mean : 22.57 Mean : 25.40
3rd Qu.: 30.00 3rd Qu.: 34.48
Max. :181.00 Max. :100.00
median_age average_ed male_count male_pct
Min. :29.00 Min. :1.600 Min. : 4.00 Min. :26.67
1st Qu.:34.00 1st Qu.:2.578 1st Qu.:19.00 1st Qu.:39.83
Median :37.00 Median :2.884 Median :28.00 Median :44.00
Mean :37.26 Mean :2.950 Mean :30.15 Mean :44.34
3rd Qu.:40.00 3rd Qu.:3.234 3rd Qu.:40.00 3rd Qu.:48.89
Max. :50.50 Max. :4.446 Max. :83.00 Max. :61.33
agriculture_count agriculture_pct mobile_money_count mobile_money_pct
Min. : 8.00 Min. : 4.42 Min. : 10.0 Min. :30.00
1st Qu.: 29.50 1st Qu.: 69.41 1st Qu.: 24.0 1st Qu.:60.00
Median : 44.00 Median : 84.44 Median : 42.0 Median :70.00
Mean : 49.43 Mean : 77.60 Mean : 48.1 Mean :68.28
3rd Qu.: 66.00 3rd Qu.: 92.64 3rd Qu.: 65.5 3rd Qu.:79.91
Max. :122.00 Max. :100.00 Max. :167.0 Max. :95.17
bank_count bank_pct mfi_count mfi_pct
Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.000
1st Qu.: 4.50 1st Qu.:10.00 1st Qu.: 1.000 1st Qu.: 2.235
Median : 9.00 Median :17.78 Median : 3.000 Median : 5.000
Mean :13.91 Mean :18.41 Mean : 4.727 Mean : 6.122
3rd Qu.:18.50 3rd Qu.:24.18 3rd Qu.: 6.000 3rd Qu.: 8.525
Max. :87.00 Max. :48.07 Max. :34.000 Max. :22.670
pension_count pension_pct insurance_count insurance_pct
Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 2.000 1st Qu.: 4.625
Median : 1.000 Median : 2.220 Median : 5.000 Median : 8.890
Mean : 2.669 Mean : 3.451 Mean : 7.094 Mean : 9.490
3rd Qu.: 4.000 3rd Qu.: 6.300 3rd Qu.: 9.000 3rd Qu.:13.330
Max. :18.000 Max. :17.330 Max. :47.000 Max. :30.990
sacco_count sacco_pct capital_count capital_pct
Min. :0.0000 Min. : 0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median : 0.000 Median :0.0000 Median :0.0000
Mean :0.8417 Mean : 1.194 Mean :0.2806 Mean :0.3348
3rd Qu.:1.0000 3rd Qu.: 1.745 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :9.0000 Max. :13.330 Max. :9.0000 Max. :6.6700
invest_count invest_pct cmg_count cmg_pct
Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.00
1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 3.000 1st Qu.: 6.67
Median : 1.00 Median : 2.500 Median : 7.000 Median :10.00
Mean : 2.82 Mean : 3.645 Mean : 8.252 Mean :12.60
3rd Qu.: 4.00 3rd Qu.: 6.670 3rd Qu.:12.000 3rd Qu.:17.33
Max. :20.00 Max. :17.330 Max. :32.000 Max. :40.91
moneylender_count moneylender_pct social_count social_pct
Min. : 0 Min. : 0.000 Min. : 0.00 Min. : 0.000
1st Qu.: 1 1st Qu.: 1.340 1st Qu.: 0.00 1st Qu.: 0.000
Median : 2 Median : 3.330 Median : 1.00 Median : 2.220
Mean : 3 Mean : 4.507 Mean : 2.05 Mean : 3.133
3rd Qu.: 4 3rd Qu.: 6.670 3rd Qu.: 3.00 3rd Qu.: 4.485
Max. :20 Max. :20.000 Max. :13.00 Max. :22.220
total_respondents
Min. : 15.00
1st Qu.: 45.00
Median : 60.00
Mean : 67.99
3rd Qu.: 89.00
Max. :181.00
# Create histograms and boxplots for the main variables
plot_hist_box <- function(data, var, title) {
# Create histogram
p1 <- ggplot(data, aes(x = .data[[var]])) +
geom_histogram(fill = "skyblue", color = "black", alpha = 0.7) +
theme_minimal() +
labs(title = paste("Histogram of", title),
x = title,
y = "Count")
# Create boxplot
p2 <- ggplot(data, aes(y = .data[[var]])) +
geom_boxplot(fill = "skyblue", alpha = 0.7) +
theme_minimal() +
labs(title = paste("Boxplot of", title),
y = title)
# Arrange plots side by side
gridExtra::grid.arrange(p1, p2, ncol = 2)
}
# Create plots for key variables
variables_to_plot <- list(
c("urban_pct", "Urban Population %"),
c("median_age", "Median Age"),
c("average_ed", "Average Education Level"),
c("male_pct", "Male Population %"),
c("agriculture_pct", "Agricultural Employment %"),
c("mobile_money_pct", "Mobile Money Usage %"),
c("bank_pct", "Bank Account Usage %"),
c("mfi_pct", "Microfinance Institution Usage %"),
c("pension_pct", "Pension Scheme Usage %"),
c("insurance_pct", "Insurance Service Usage %"),
c("sacco_pct", "Savings & Credit Co-op Usage %"),
c("cmg_pct", "Credit Management Group Usage %"),
c("capital_pct", "Capital Market Fund Usage %"),
c("invest_pct", "Formal Investment Usage %"),
c("moneylender_pct", "Informal Moneylender Usage %")
)
# Generate all plots
for(var in variables_to_plot) {
plot_hist_box(district_summary_spatial, var[1], var[2])
}
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Create correlation plot for financial inclusion indicators
financial_vars <- c("mobile_money_pct", "bank_pct", "mfi_pct",
"pension_pct", "insurance_pct", "sacco_pct",
"cmg_pct","capital_pct","invest_pct","moneylender_pct")
correlation_data <- district_summary_spatial %>%
st_drop_geometry() %>%
select(all_of(financial_vars))
# Create correlation plot
corrplot::corrplot(cor(correlation_data, use = "complete.obs"),
method = "color",
type = "upper",
addCoef.col = "black",
tl.col = "black",
tl.srt = 45,
diag = FALSE)
This correlation heatmap provides insights into the relationships between mobile money usage percentage (mobile_money_pct) and various financial service usage metrics across Tanzania. Focusing on mobile_money_pct, I observed that it has the strongest positive correlations with bank_pct (0.61) and mfi_pct (0.60), suggesting that higher usage of banking and microfinance services is associated with greater adoption of mobile money services. This could imply that regions with more formal financial services may also be more inclined to adopt mobile money, possibly due to higher financial literacy or a stronger financial infrastructure.
Other variables, such as pension_pct and insurance_pct, show moderate positive correlations with mobile_money_pct (0.40 and 0.25, respectively). This indicates a lesser, yet still positive, association with mobile money usage. Meanwhile, sacco_pct, cmg_pct, and moneylender_pct show weak positive correlations with mobile money usage, while invest_pct has a very weak negative correlation (-0.09), suggesting minimal or mixed associations.
Another notable observation is the near-perfect correlation between pension_pct and invest_pct (0.98), which indicates that the two variables carry almost the same information. Including both in a regression would inflate the standard errors of their coefficients, so I will drop invest_pct. Overall, the correlation matrix suggests that formal financial service usage, particularly bank and microfinance services, has the strongest positive association with mobile money adoption across Tanzania. This information could be useful for targeting strategies aimed at increasing mobile money usage, especially in regions with already established banking or microfinance services.
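Reading highly correlated pairs off a heatmap works for ten variables, but the same screening can be done programmatically. The sketch below uses synthetic data and an arbitrary threshold of 0.9; both are assumptions chosen for illustration, not values from this analysis.

```r
# Synthetic data: x2 is almost a copy of x1, x3 is independent of both
set.seed(42)
x1  <- rnorm(100)
dat <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(100, sd = 0.1),
                  x3 = rnorm(100))

cor_mat <- cor(dat, use = "complete.obs")

# Keep each pair once (upper triangle) and flag |r| above the chosen threshold
threshold <- 0.9
high <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)

flagged <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)

print(flagged)
```

Each flagged pair is a candidate for dropping one of its two variables before fitting a regression.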
Regression
# Build the adaptive bandwidth GWR model
bw.adaptive <- bw.gwr(formula = mobile_money_count ~
urban_pct +
median_age +
average_ed +
male_pct +
agriculture_pct +
bank_pct +
mfi_pct +
pension_pct +
sacco_pct +
insurance_pct +
cmg_pct +
moneylender_pct,
data = district_summary_spatial,
approach = "CV",
kernel = "gaussian",
adaptive = TRUE,
longlat = FALSE)
Adaptive bandwidth: 93 CV score: 75375.78
Adaptive bandwidth: 65 CV score: 76966.72
Adaptive bandwidth: 110 CV score: 74948.85
Adaptive bandwidth: 121 CV score: 74802.25
Adaptive bandwidth: 127 CV score: 74718.38
Adaptive bandwidth: 132 CV score: 74643.56
Adaptive bandwidth: 134 CV score: 74613.04
Adaptive bandwidth: 136 CV score: 74603.76
Adaptive bandwidth: 137 CV score: 74598.76
Adaptive bandwidth: 138 CV score: 74591.25
Adaptive bandwidth: 138 CV score: 74591.25
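# Fit the GWR model using the optimal bandwidth
# Note: this formula includes invest_pct, which the bandwidth search above omits;
# the printed results below reflect the formula as written here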
gwr.model <- gwr.basic(formula = mobile_money_count ~
urban_pct +
median_age +
average_ed +
male_pct +
agriculture_pct +
bank_pct +
mfi_pct +
pension_pct +
sacco_pct +
insurance_pct +
cmg_pct +
invest_pct +
moneylender_pct,
data = district_summary_spatial,
bw = bw.adaptive,
kernel = "gaussian",
adaptive = TRUE,
longlat = FALSE)
# Print model diagnostics
gwr.model
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2024-11-08 17:04:20.300623
Call:
gwr.basic(formula = mobile_money_count ~ urban_pct + median_age +
average_ed + male_pct + agriculture_pct + bank_pct + mfi_pct +
pension_pct + sacco_pct + insurance_pct + cmg_pct + invest_pct +
moneylender_pct, data = district_summary_spatial, bw = bw.adaptive,
kernel = "gaussian", adaptive = TRUE, longlat = FALSE)
Dependent (y) variable: mobile_money_count
Independent variables: urban_pct median_age average_ed male_pct agriculture_pct bank_pct mfi_pct pension_pct sacco_pct insurance_pct cmg_pct invest_pct moneylender_pct
Number of data points: 139
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-54.641 -14.540 -1.805 13.866 62.114
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.53939 33.89187 -0.606 0.545596
urban_pct 0.44051 0.11419 3.858 0.000182 ***
median_age 0.13876 0.48621 0.285 0.775819
average_ed 16.99127 6.90880 2.459 0.015286 *
male_pct 0.07126 0.27156 0.262 0.793444
agriculture_pct -0.03117 0.15891 -0.196 0.844793
bank_pct 0.22653 0.26398 0.858 0.392477
mfi_pct 0.25138 0.51393 0.489 0.625611
pension_pct -1.35136 2.78282 -0.486 0.628096
sacco_pct -1.96007 1.00632 -1.948 0.053686 .
insurance_pct 0.43655 0.39172 1.114 0.267222
cmg_pct -0.31954 0.22752 -1.404 0.162662
invest_pct 1.15088 2.75064 0.418 0.676369
moneylender_pct -0.35728 0.46184 -0.774 0.440628
---Significance stars
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.84 on 125 degrees of freedom
Multiple R-squared: 0.5346
Adjusted R-squared: 0.4862
F-statistic: 11.04 on 13 and 125 DF, p-value: 2.077e-15
***Extra Diagnostic information
Residual sum of squares: 59639.79
Sigma(hat): 20.86449
AIC: 1267.028
AICc: 1270.93
BIC: 1246.062
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Adaptive bandwidth: 138 (number of nearest neighbours)
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu. Max.
Intercept -32.4915092 -27.1804597 -22.4779079 -19.4541636 -12.8343
urban_pct 0.4225319 0.4240663 0.4317376 0.4469388 0.4555
median_age 0.0948055 0.1341729 0.1641688 0.2046965 0.2344
average_ed 15.4865555 16.6111845 17.0191379 17.5403920 18.2938
male_pct 0.0211997 0.0579996 0.0904163 0.1070567 0.1440
agriculture_pct -0.0516092 -0.0391837 -0.0255326 -0.0096544 -0.0022
bank_pct 0.1997025 0.2144771 0.2462946 0.2610998 0.2696
mfi_pct 0.2267741 0.2277642 0.2336139 0.2611214 0.2776
pension_pct -2.2040833 -2.0323281 -1.7827381 -1.3813795 -1.0678
sacco_pct -2.1128004 -2.0297480 -1.9866202 -1.8954683 -1.7415
insurance_pct 0.4099617 0.4231665 0.4583165 0.4843928 0.4918
cmg_pct -0.3644414 -0.3445227 -0.3113028 -0.2863834 -0.2784
invest_pct 0.9538858 1.2109123 1.5399441 1.7334652 1.8925
moneylender_pct -0.4629537 -0.4197574 -0.3463350 -0.2934797 -0.2468
************************Diagnostic information*************************
Number of data points: 139
Effective number of parameters (2trace(S) - trace(S'S)): 17.64289
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 121.3571
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 1271.901
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 1248.973
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 1172.609
Residual sum of squares: 57958.69
R-square value: 0.5476804
Adjusted R-square value: 0.4813758
***********************************************************************
Program stops at: 2024-11-08 17:04:20.340749
# Extract local R2 values
local_r2 <- gwr.model$SDF$Local_R2
# Extract coefficient estimates
coef_estimates <- as.data.frame(gwr.model$SDF)
# Create summary statistics for the first four local coefficients
# (intercept, urban_pct, median_age, average_ed)
coef_summary <- data.frame(
Variable = names(coef_estimates)[1:4], # Includes intercept
Mean = colMeans(coef_estimates[,1:4]),
Min = apply(coef_estimates[,1:4], 2, min),
Max = apply(coef_estimates[,1:4], 2, max),
SD = apply(coef_estimates[,1:4], 2, sd)
)
print(coef_summary)
Variable Mean Min Max SD
Intercept Intercept -22.9949012 -32.49150921 -12.8343327 5.09815958
urban_pct urban_pct 0.4355998 0.42253188 0.4555461 0.01182212
median_age median_age 0.1675192 0.09480547 0.2344274 0.03915137
average_ed average_ed 17.0309513 15.48655552 18.2937911 0.71052515
summary(gwr.model$SDF$yhat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
12.33 31.84 45.53 48.04 55.63 122.84
# First, let's properly prepare the data
# Convert GWR results to appropriate format
gwr_results <- as.data.frame(gwr.model$SDF)
# Ensure the spatial data is properly formatted
district_gwr.sf.combined <- district_summary_spatial %>%
cbind(Local_R2 = gwr_results$Local_R2) %>%
st_as_sf()
# Check the structure
str(district_gwr.sf.combined$Local_R2)
num [1:139] 0.545 0.545 0.552 0.549 0.54 ...
tmap_mode("plot")
tmap mode set to plotting
# Create the map
tm_shape(district_gwr.sf.combined) +
tm_fill(col = "Local_R2",
style = "pretty",
palette = "viridis",
title = "Local R-squared Values") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "GWR Model Performance by District",
main.title.size = 1,
frame = FALSE) +
tm_compass(type = "arrow", position = c("right", "top")) +
tm_scale_bar(position = c("left", "bottom"))
# Create summary statistics of Local R2 values
summary_stats <- summary(district_gwr.sf.combined$Local_R2)
print(summary_stats)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5314 0.5409 0.5473 0.5468 0.5525 0.5601
The local R-squared map shows how well the Geographically Weighted Regression (GWR) model explains mobile money usage in each district. The values span a very narrow range, from 0.531 to 0.560, so model performance is nearly uniform across Tanzania rather than strongly regional. Within that range, the southern and south-eastern districts (in green and yellow) have slightly higher values, meaning the model explains marginally more of the variance there, while the northern districts (in dark blue/purple) have slightly lower values. The narrow range follows from the selected adaptive bandwidth of 138 nearest neighbours out of 139 districts: every local regression draws on almost the whole dataset, so the local fits stay close to the global one (global R-squared 0.535, GWR R-squared 0.548). The small north-south gradient may indicate that predictors such as education or urbanisation are slightly more relevant in some areas than in others, or that other local factors influence mobile money adoption in the north that the current model does not capture.
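To see why a bandwidth of 138 neighbours yields almost uniform local results, it helps to look at the kernel weights directly. The sketch below assumes the common Gaussian form w = exp(-0.5 (d / b)^2) with the adaptive bandwidth b set to the distance of the k-th nearest point; this is a simplified illustration on synthetic coordinates, and the helper `gauss_weights()` is hypothetical rather than a GWmodel function.

```r
# Synthetic coordinates in metres; these are not the Tanzanian centroids
set.seed(1)
n      <- 139
coords <- cbind(runif(n, 0, 1e6), runif(n, 0, 1e6))

# Distances from the first point to every point (itself included)
d <- sqrt((coords[, 1] - coords[1, 1])^2 + (coords[, 2] - coords[1, 2])^2)

# Hypothetical helper: Gaussian weights with the bandwidth set to the
# distance of the k-th nearest point
gauss_weights <- function(d, k) {
  bw <- sort(d)[k]
  exp(-0.5 * (d / bw)^2)
}

w_wide   <- gauss_weights(d, k = 138)  # bandwidth selected in this analysis
w_narrow <- gauss_weights(d, k = 20)   # a much more local alternative, for contrast

print(summary(w_wide))
print(summary(w_narrow))
```

With k = 138, at least 138 of the 139 points receive a weight of exp(-0.5) (about 0.61) or more, so each local regression is close to an ordinary global fit. With k = 20, most points are strongly down-weighted and the coefficients have room to vary across space.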
Analysis of the regression results
# Create list of all variables we can analyze
gwr_variables <- c(
"urban_pct" = "Urban Population %",
"median_age" = "Median Age",
"average_ed" = "Average Education Level",
"male_pct" = "Male Population %",
"agriculture_pct" = "Agricultural Employment %",
"bank_pct" = "Bank Usage %",
"mfi_pct" = "Microfinance Institution Usage %",
"pension_pct" = "Pension Usage %",
"insurance_pct" = "Insurance Usage %",
"sacco_pct" = "SACCO Usage %",
"cmg_pct" = "Credit Management Group Usage %",
"invest_pct" = "Formal Investment Usage %",
"moneylender_pct" = "Informal Moneylender Usage %"
)
plot_gwr_stats <- function(variable_name, model = gwr.model, spatial_data = district_summary_spatial) {
# Extract variable statistics
stats <- data.frame(
district_name = spatial_data$district_name,
coefficient = as.numeric(unlist(model$SDF[[paste0(variable_name)]])),
t_value = as.numeric(unlist(model$SDF[[paste0(variable_name, "_TV")]])),
se_value = as.numeric(unlist(model$SDF[[paste0(variable_name, "_SE")]])))
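# Calculate two-sided p-values from the local t-values; df = 121 approximates the
# effective degrees of freedom (121.36) reported in the GWR diagnostics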
stats$p_value <- 2 * pt(abs(stats$t_value),
df = 121,
lower.tail = FALSE)
# Join with spatial data
analysis_sf <- spatial_data %>%
left_join(stats, by = "district_name") %>%
st_as_sf()
# Set tmap mode to plot
tmap_mode("plot")
# Create coefficient map
coef_map <- tm_shape(analysis_sf) +
tm_fill(col = "coefficient",
style = "quantile",
n = 5,
palette = "RdBu",
midpoint = 0,
title = "Coefficient Values") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = paste(gwr_variables[variable_name], "\nCoefficients"),
main.title.size = 0.8,
legend.title.size = 0.7,
legend.text.size = 0.6,
frame = FALSE)
# Create p-value map
p_map <- tm_shape(analysis_sf) +
tm_fill(col = "p_value",
style = "fixed",
breaks = c(0, 0.01, 0.05, 0.1, 1),
palette = "viridis",
title = "P-values") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = paste(gwr_variables[variable_name], "\nP-values"),
main.title.size = 0.8,
legend.title.size = 0.7,
legend.text.size = 0.6,
frame = FALSE)
# Arrange maps side by side
combined_maps <- tmap_arrange(coef_map, p_map, ncol = 2)
# Print summaries
cat("\nSummary Statistics for", gwr_variables[variable_name], "\n")
cat("\nCoefficients:\n")
print(summary(stats$coefficient))
cat("\nStandard Errors:\n")
print(summary(stats$se_value))
cat("\nP-values:\n")
print(summary(stats$p_value))
# Create significance summary with mean coefficients
significance_counts <- data.frame(
Significance = c("Highly significant (p < 0.01)",
"Significant (0.01 ≤ p < 0.05)",
"Marginally significant (0.05 ≤ p < 0.1)",
"Not significant (p ≥ 0.1)"),
Count = c(
sum(stats$p_value < 0.01),
sum(stats$p_value >= 0.01 & stats$p_value < 0.05),
sum(stats$p_value >= 0.05 & stats$p_value < 0.1),
sum(stats$p_value >= 0.1)
),
Percentage = c(
mean(stats$p_value < 0.01),
mean(stats$p_value >= 0.01 & stats$p_value < 0.05),
mean(stats$p_value >= 0.05 & stats$p_value < 0.1),
mean(stats$p_value >= 0.1)
) * 100,
Mean_Coefficient = c(
mean(stats$coefficient[stats$p_value < 0.01]),
mean(stats$coefficient[stats$p_value >= 0.01 & stats$p_value < 0.05]),
mean(stats$coefficient[stats$p_value >= 0.05 & stats$p_value < 0.1]),
mean(stats$coefficient[stats$p_value >= 0.1])
)
)
cat("\nSignificance Summary:\n")
print(significance_counts)
# Print additional interpretation
cat("\nInterpretation:\n")
cat("Coefficient range:", round(min(stats$coefficient), 3), "to", round(max(stats$coefficient), 3), "\n")
cat("Mean coefficient:", round(mean(stats$coefficient), 3), "\n")
cat("Percentage of significant coefficients (p < 0.05):",
round(mean(stats$p_value < 0.05) * 100, 1), "%\n")
# Return maps and data
return(list(maps = combined_maps))
}
Examining how urbanity affects use of mobile money
plot_gwr_stats("urban_pct")
tmap mode set to plotting
Summary Statistics for Urban Population %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4225 0.4241 0.4317 0.4356 0.4469 0.4555
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1147 0.1149 0.1150 0.1150 0.1151 0.1152
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0001279 0.0001676 0.0002715 0.0002520 0.0003376 0.0003546
Significance Summary:
Significance Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01) 139 100 0.4355998
2 Significant (0.01 ≤ p < 0.05) 0 0 NaN
3 Marginally significant (0.05 ≤ p < 0.1) 0 0 NaN
4 Not significant (p ≥ 0.1) 0 0 NaN
Interpretation:
Coefficient range: 0.423 to 0.456
Mean coefficient: 0.436
Percentage of significant coefficients (p < 0.05): 100 %
$maps

The relationship between urban population percentage and mobile money usage in Tanzania is positive everywhere, with a mild geographical gradient. The coefficient map on the left shows values ranging from 0.423 to 0.456, with the darkest regions in the south and southeast showing the strongest positive effect of urban population percentage on mobile money usage. The lighter northern regions have lower coefficients, suggesting a slightly weaker association between urbanisation and mobile money adoption. The p-value map on the right confirms that this relationship is statistically significant in all 139 districts, each with a p-value below 0.01. This consistent significance indicates that, despite the modest regional differences, urban population percentage is a key predictor of mobile money usage throughout Tanzania.
A closer examination of geographical features provides insight into these spatial patterns. The Southern Highlands—notably the agricultural districts of Mbeya, Iringa, and Njombe—depend heavily on farming, with predominantly rural populations. In these areas, the positive relationship between urban population percentage and mobile money usage likely reflects the limited financial infrastructure in rural zones. As urban centres develop in these highland regions, they become focal points for financial services, promoting mobile money adoption among rural residents seeking alternatives to traditional banking.
In the Lake Victoria basin in the northwest, covering regions like Mwanza, Mara, and Kagera, the economy is also largely agricultural. Here, the association between urbanisation and mobile money usage is weaker, as rural populations often rely on informal financial systems and may have limited exposure to mobile financial services. This reliance on agriculture, combined with high rural population density, may explain the lower coefficients, which reflect a more limited influence of urbanisation on mobile money adoption in these areas.
Conversely, coastal regions such as Dar es Salaam and Zanzibar demonstrate a stronger positive relationship. As major economic and trade hubs, these coastal areas are highly urbanised and equipped with well-developed infrastructure for financial services and mobile connectivity. Urbanisation here enhances accessibility to mobile money platforms, with residents and businesses readily adopting mobile financial services. This dense service network along the coast contributes to higher coefficients in the south and southeast, where urbanisation plays a substantial role in expanding financial inclusion.
In summary, urban population percentage is a significant predictor of mobile money usage across Tanzania, and its effect differs only modestly by region (coefficients between 0.423 and 0.456). The association is somewhat weaker in the north, including the agricultural Lake Victoria basin, likely due to limited financial infrastructure and reliance on traditional financial practices. It is somewhat stronger in the south and southeast and in coastal economic hubs like Dar es Salaam and Zanzibar. These findings point to region-specific strategies for promoting financial inclusion: improving infrastructure in agricultural regions, and building on established urbanisation in tourism and coastal areas to maximise mobile money adoption.
Examining how age affects use of mobile money
plot_gwr_stats("median_age")
tmap mode set to plotting
Summary Statistics for Median Age
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.09481 0.13417 0.16417 0.16752 0.20470 0.23443
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4882 0.4891 0.4896 0.4895 0.4898 0.4912
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.6338 0.6767 0.7380 0.7336 0.7844 0.8471
Significance Summary:
Significance Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01) 0 0 NaN
2 Significant (0.01 ≤ p < 0.05) 0 0 NaN
3 Marginally significant (0.05 ≤ p < 0.1) 0 0 NaN
4 Not significant (p ≥ 0.1) 139 100 0.1675192
Interpretation:
Coefficient range: 0.095 to 0.234
Mean coefficient: 0.168
Percentage of significant coefficients (p < 0.05): 0 %
$maps

The relationship between median age and mobile money usage is weak everywhere in Tanzania. The coefficient map on the left shows values ranging from 0.095 to 0.234, with darker areas, particularly in the central and northeastern regions, showing higher coefficients and lighter areas, mainly in the western and southern regions, showing lower ones. These differences should not be over-interpreted, however. The p-value map on the right places every district in the highest class (0.10 to 1.00, shown in yellow), and the summary confirms that all 139 local p-values lie between 0.63 and 0.85, so none of the local coefficients is statistically distinguishable from zero. Any association between age and mobile money usage in this dataset is therefore too weak and unreliable to support conclusions about regional differences.
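The p-values discussed here come from dividing each coefficient by its standard error and comparing the resulting t-value with a t distribution. As a worked illustration, the sketch below reproduces the global OLS row for median_age from the regression table above; the variable names are chosen for this example only.

```r
# Global OLS values for median_age, copied from the regression table above
estimate  <- 0.13876
std_error <- 0.48621
df_resid  <- 125   # residual degrees of freedom reported for the global model

# t-value and two-sided p-value
t_value <- estimate / std_error
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

print(round(c(t = t_value, p = p_value), 3))
```

Because the standard error is more than three times the estimate, the t-value is small and the p-value is far above any conventional threshold. The local GWR p-values for median_age follow the same logic.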
Examining how education affects use of mobile money
plot_gwr_stats("average_ed")
tmap mode set to plotting
Summary Statistics for Average Education Level
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.49 16.61 17.02 17.03 17.54 18.29
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
6.938 6.948 6.958 6.957 6.964 6.984
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.009791 0.012979 0.015665 0.016315 0.018415 0.028370
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               3   2.158273         18.27427
2 Significant (0.01 ≤ p < 0.05)             136  97.841727         17.00353
3 Marginally significant (0.05 ≤ p < 0.1)     0   0.000000              NaN
4 Not significant (p ≥ 0.1)                   0   0.000000              NaN
Interpretation:
Coefficient range: 15.487 to 18.294
Mean coefficient: 17.031
Percentage of significant coefficients (p < 0.05): 100 %
$maps

In examining the relationship between average education level and mobile money usage across Tanzania, notable geographic patterns emerge in the coefficient values. The coefficient map on the left shows a range from 15.49 to 18.29, with darker regions in the north and northeast displaying higher coefficients. This suggests that in these areas, average education level has a stronger positive effect on mobile money usage; as education levels increase, mobile money adoption is expected to rise more steeply. These northern regions, including Arusha, Kilimanjaro, and parts of Tanga, have relatively high education levels compared to other areas. This stronger association may be influenced by the presence of educational institutions and greater economic opportunities in these regions, which create a favourable environment for the adoption of mobile financial services. Additionally, these regions are close to major tourist areas like the Serengeti and Ngorongoro, where the influence of tourism and a higher influx of educated individuals may further contribute to this positive effect.
In contrast, the lighter-shaded regions in the south and west, including areas like Rukwa, Katavi, and parts of the Southern Highlands, show lower coefficients, indicating a weaker relationship between education and mobile money usage. These areas are generally more rural and agricultural, with lower average education levels and limited access to financial infrastructure. In these regions, even where education levels increase, the impact on mobile money adoption appears less substantial, possibly due to economic activities that are less reliant on formal financial systems or due to limited exposure to mobile financial services.
The p-values map on the right further supports these findings: all 139 districts have p-values below 0.05 (ranging from 0.0098 to 0.0284). The small, differently shaded area in the far northeast most likely corresponds to the three districts where the p-value drops below 0.01. This indicates that the relationship between education and mobile money usage is statistically significant throughout Tanzania. The significance across regions underscores the role of education as a consistent predictor of mobile money usage, though its strength varies depending on regional characteristics.
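As a rough check on how the p-values relate to the coefficients and standard errors reported above, the mean education coefficient and mean standard error can be turned into a t-value and a two-sided p-value. The residual degrees of freedom of the GWR model are not shown in this section, so `df_assumed` below is an assumption, and dividing a mean coefficient by a mean standard error only approximates the district-level values.

```r
# Values copied from the summary output for average_ed
mean_coef <- 17.031   # mean local coefficient
mean_se   <- 6.957    # mean local standard error

# ASSUMPTION: residual degrees of freedom. The effective value from the
# fitted GWR model is not reported in this section.
df_assumed <- 120

t_value <- mean_coef / mean_se
p_value <- 2 * pt(-abs(t_value), df = df_assumed)  # two-sided p-value

cat("t-value:", round(t_value, 3), "\n")
cat("approx. two-sided p-value:", signif(p_value, 3), "\n")
```

Under these assumptions the result falls in the 0.01 to 0.05 band, which is consistent with the significance summary for education.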
Regional Analysis and Implications
In the northern and northeastern regions, where education appears to have a stronger impact, the connection between higher education levels and mobile money usage may reflect a more developed financial ecosystem and greater economic diversity. Urban centres in this area, such as Arusha and Moshi (near Kilimanjaro), likely provide residents with more access to financial services, fostering a positive environment for mobile money adoption among educated populations. This suggests that policies encouraging education in these areas could further boost financial inclusion, as residents are already predisposed to adopt mobile financial services.
In contrast, the southern and western regions are more agrarian, with economic activities focused on farming and limited urbanisation. Here, the lower coefficients suggest that increasing education alone may not be sufficient to significantly boost mobile money usage without concurrent investments in infrastructure and financial access. Regions like the Southern Highlands (Mbeya, Rukwa) and the Lake Tanganyika area are marked by lower population densities and limited telecommunications infrastructure, which may explain why the relationship between education and mobile money adoption is weaker. Efforts to improve financial inclusion in these areas may need to focus not only on education but also on expanding infrastructure and financial literacy programs tailored to rural communities.
In summary, this analysis highlights that while average education level is a meaningful predictor of mobile money usage across Tanzania, the strength of its impact is geographically variable. Regions in the north and northeast, which are more economically diverse and urbanised, show a stronger positive relationship, reflecting how education can enhance financial inclusion when combined with accessible financial services and infrastructure. Meanwhile, the weaker association in the rural south and west suggests that improving mobile money adoption in these areas will require multi-faceted strategies that go beyond education to address underlying infrastructure and economic challenges. Recognising these regional differences allows for targeted interventions that align with the specific socio-economic and geographical needs of each area, ultimately advancing financial inclusion in a way that reflects Tanzania’s diverse landscape.
Examining how gender affects use of mobile money
plot_gwr_stats("male_pct")
tmap mode set to plotting
Summary Statistics for Male Population %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.02120 0.05800 0.09042 0.08424 0.10706 0.14399
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2727 0.2731 0.2732 0.2733 0.2734 0.2746
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.6009 0.6960 0.7413 0.7600 0.8320 0.9385
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100       0.08424209
Interpretation:
Coefficient range: 0.021 to 0.144
Mean coefficient: 0.084
Percentage of significant coefficients (p < 0.05): 0 %
$maps

In exploring the influence of gender on mobile money usage across Tanzania, I see that the relationship between the male population percentage and mobile money usage is not statistically significant in any of the 139 districts, with p-values ranging from 0.60 to 0.94. Although the coefficients are positive (0.021 to 0.144), the relationship is not strong enough to be considered reliable in this dataset.
Examining how agriculture activity affects use of mobile money
plot_gwr_stats("agriculture_pct")
tmap mode set to plotting
Summary Statistics for Agricultural Employment %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.051609 -0.039184 -0.025533 -0.024973 -0.009654 -0.002237
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1595 0.1599 0.1600 0.1600 0.1601 0.1606
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7476 0.8068 0.8733 0.8768 0.9520 0.9889
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100      -0.02497346
Interpretation:
Coefficient range: -0.052 to -0.002
Mean coefficient: -0.025
Percentage of significant coefficients (p < 0.05): 0 %
$maps

In exploring the influence of agricultural employment percentage on mobile money usage across Tanzania, I see that the relationship is not statistically significant in any of the 139 districts, with p-values ranging from 0.75 to 0.99. Although the coefficients are negative (-0.052 to -0.002), the relationship is not strong enough to be considered reliable in this dataset.
Examining how banking use affects use of mobile money
plot_gwr_stats("bank_pct")
tmap mode set to plotting
Summary Statistics for Bank Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1997 0.2145 0.2463 0.2397 0.2611 0.2696
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2649 0.2655 0.2666 0.2663 0.2672 0.2674
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3146 0.3306 0.3559 0.3719 0.4206 0.4536
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100        0.2396519
Interpretation:
Coefficient range: 0.2 to 0.27
Mean coefficient: 0.24
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.31 and 0.45 in all 139 districts, the effect of bank usage on mobile money usage is not statistically significant.
Examining how usage of microfinance institution affects use of mobile money
plot_gwr_stats("mfi_pct")
tmap mode set to plotting
Summary Statistics for Microfinance Institution Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2268 0.2278 0.2336 0.2439 0.2611 0.2776
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5162 0.5170 0.5174 0.5175 0.5181 0.5196
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5926 0.6146 0.6523 0.6385 0.6604 0.6619
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100         0.243943
Interpretation:
Coefficient range: 0.227 to 0.278
Mean coefficient: 0.244
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.59 and 0.66 in all 139 districts, the effect of microfinance institution (MFI) usage on mobile money usage is not statistically significant.
Examining how pension usage affects use of mobile money
plot_gwr_stats("pension_pct")
tmap mode set to plotting
Summary Statistics for Pension Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.204 -2.032 -1.783 -1.708 -1.381 -1.068
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.788 2.792 2.814 2.806 2.817 2.820
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4358 0.4719 0.5264 0.5474 0.6214 0.7026
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100        -1.707537
Interpretation:
Coefficient range: -2.204 to -1.068
Mean coefficient: -1.708
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.44 and 0.70 in all 139 districts, the effect of pension usage on mobile money usage is not statistically significant.
Examining how usage of insurance affects use of mobile money
plot_gwr_stats("insurance_pct")
tmap mode set to plotting
Summary Statistics for Insurance Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4100 0.4232 0.4583 0.4552 0.4844 0.4918
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3936 0.3938 0.3941 0.3943 0.3947 0.3959
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2143 0.2214 0.2486 0.2519 0.2848 0.3001
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100        0.4551736
Interpretation:
Coefficient range: 0.41 to 0.492
Mean coefficient: 0.455
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.21 and 0.30 in all 139 districts, the effect of insurance usage on mobile money usage is not statistically significant.
Examining how SACCO usage affects use of mobile money
plot_gwr_stats("sacco_pct")
tmap mode set to plotting
Summary Statistics for SACCO Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.113 -2.030 -1.987 -1.961 -1.895 -1.741
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.009 1.010 1.011 1.011 1.012 1.017
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.03961 0.04690 0.05154 0.05590 0.06304 0.08932
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0    0.00000              NaN
2 Significant (0.01 ≤ p < 0.05)              61   43.88489        -2.043493
3 Marginally significant (0.05 ≤ p < 0.1)    78   56.11511        -1.897046
4 Not significant (p ≥ 0.1)                   0    0.00000              NaN
Interpretation:
Coefficient range: -2.113 to -1.741
Mean coefficient: -1.961
Percentage of significant coefficients (p < 0.05): 43.9 %
$maps

In examining the relationship between SACCO (Savings and Credit Cooperative Organization) usage percentage and mobile money usage across Tanzania, distinct geographical patterns emerge in both the coefficient and p-value maps. The coefficient map on the left shows values ranging from -2.113 to -1.741, all negative, with the darkest shades in the northern regions indicating the strongest negative coefficients. This suggests that in these areas, an increase in SACCO usage percentage is associated with a larger decrease in mobile money usage, potentially reflecting a preference for SACCOs over mobile financial services. Regions like Kilimanjaro, Arusha, and parts of Mara in the north may have deeply rooted SACCO networks that serve as primary financial institutions, reducing the need for mobile money options. SACCOs in these areas likely provide accessible and trusted financial services, and residents may view them as a stable alternative to newer mobile financial solutions, particularly where SACCOs have historically established strong ties within communities.
In contrast, the lighter regions in the southern areas of Tanzania, including districts in Mbeya and Ruvuma, display smaller negative coefficients, indicating a weaker inverse relationship between SACCO usage and mobile money adoption. These southern regions, though they may have SACCOs, do not exhibit the same level of negative association, possibly because mobile money services are more widely accepted or integrated with SACCOs, or because SACCO presence is less dominant. This could reflect a more flexible financial ecosystem in the south, where mobile money services and SACCOs coexist without significant competition for users.
The p-value map on the right shows that every district has a p-value below 0.10 (ranging from 0.040 to 0.089), represented by dark blue and green shading. At the conventional 0.05 threshold, the relationship between SACCO usage and mobile money usage is statistically significant in 61 of the 139 districts (43.9%) and only marginally significant (0.05 ≤ p < 0.1) in the remaining 78 (56.1%). Combined with the uniformly negative coefficients, this suggests that SACCO usage plays a meaningful role in mobile money adoption, although the evidence is weaker than for education. In regions with a strong SACCO presence, these cooperatives may fulfil financial needs that mobile money platforms otherwise would, thereby limiting mobile money’s role.
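How much of Tanzania counts as "significant" for SACCO usage depends on the threshold chosen. The short calculation below uses only the counts from the SACCO significance summary above; the variable names are illustrative.

```r
# Counts copied from the SACCO "Significance Summary" table
n_districts <- 139
n_sig_05    <- 61   # districts with 0.01 <= p < 0.05
n_marginal  <- 78   # districts with 0.05 <= p < 0.1

# Share of districts counted as significant under each threshold
share_at_05 <- 100 * n_sig_05 / n_districts
share_at_10 <- 100 * (n_sig_05 + n_marginal) / n_districts

cat(sprintf("Significant at 0.05: %.1f%%\n", share_at_05))
cat(sprintf("Significant at 0.10: %.1f%%\n", share_at_10))
```

The gap between the two shares is why the SACCO result is best described as significant in part of the country and marginal in the rest.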
Regional Implications and Analysis
In the northern regions, where SACCO usage has a stronger negative impact on mobile money adoption, the preference for SACCOs may be shaped by socio-cultural factors and the structure of the local economy. These regions often have stronger communal and cooperative financial practices, where SACCOs are community-driven and cater to collective needs, making them highly trusted institutions. Additionally, SACCOs offer specific financial services, such as loans and savings, that might be seen as more comprehensive compared to mobile money, which primarily focuses on payments and transfers. In these areas, policies to promote mobile money may need to address this preference by exploring potential partnerships between mobile money providers and SACCOs or by expanding the financial services available through mobile platforms to make them more competitive.
In the southern regions, where the inverse relationship is weaker, mobile money adoption might coexist more readily with SACCO services, suggesting that SACCOs do not dominate the financial landscape to the same extent as in the north. This could be due to a more diversified financial ecosystem, where users feel comfortable accessing both SACCO services and mobile money options. Here, policies to promote mobile money may be more straightforward, focusing on improving accessibility and enhancing user awareness without facing the same level of competition from SACCOs.
Overall, these findings highlight that SACCO usage acts as a significant factor in limiting mobile money adoption in Tanzania, with stronger effects in the north where SACCOs are deeply embedded in the financial culture. The need for tailored approaches becomes clear: in SACCO-dominant regions, efforts to increase mobile money adoption might focus on building alliances with SACCOs or providing similar financial services through mobile platforms. In regions where the SACCO influence is weaker, mobile money services could expand through straightforward strategies like awareness campaigns and infrastructure investments. Recognising these regional differences allows for interventions that respect local financial preferences while still advancing the broader goal of financial inclusion across Tanzania.
Examining how Credit Management Group (CMG) usage affects use of mobile money
plot_gwr_stats("cmg_pct")
tmap mode set to plotting
Summary Statistics for Credit Management Group Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.3644 -0.3445 -0.3113 -0.3161 -0.2864 -0.2784
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2283 0.2289 0.2294 0.2293 0.2298 0.2303
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1157 0.1365 0.1760 0.1742 0.2133 0.2267
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100       -0.3160863
Interpretation:
Coefficient range: -0.364 to -0.278
Mean coefficient: -0.316
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.12 and 0.23 in all 139 districts, the effect of Credit Management Group (CMG) usage on mobile money usage is not statistically significant.
Examining how formal investment usage affects use of mobile money
plot_gwr_stats("invest_pct")
tmap mode set to plotting
Summary Statistics for Formal Investment Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.9539 1.2109 1.5399 1.4734 1.7335 1.8925
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.756 2.760 2.782 2.774 2.785 2.788
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4983 0.5350 0.5786 0.5985 0.6613 0.7300
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100          1.47335
Interpretation:
Coefficient range: 0.954 to 1.893
Mean coefficient: 1.473
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.50 and 0.73 in all 139 districts, the effect of formal investment usage on mobile money usage is not statistically significant.
Examining how informal moneylender usage affects use of mobile money
plot_gwr_stats("moneylender_pct")
tmap mode set to plotting
Summary Statistics for Informal Moneylender Usage %
Coefficients:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.4630 -0.4198 -0.3463 -0.3547 -0.2935 -0.2468
Standard Errors:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4636 0.4644 0.4646 0.4649 0.4655 0.4669
P-values:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3224 0.3691 0.4575 0.4517 0.5285 0.5976
Significance Summary:
  Significance                            Count Percentage Mean_Coefficient
1 Highly significant (p < 0.01)               0          0              NaN
2 Significant (0.01 ≤ p < 0.05)               0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4 Not significant (p ≥ 0.1)                 139        100       -0.3547404
Interpretation:
Coefficient range: -0.463 to -0.247
Mean coefficient: -0.355
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.32 and 0.60 in all 139 districts, the effect of informal moneylender usage on mobile money usage is not statistically significant.
Concluding points
Building upon the initial analysis of the geography of financial inclusion in Tanzania, the Geographically Weighted Regression (GWR) model provides nuanced district-level insights into the factors influencing access to financial services. The GWR results reveal spatial variability in model fit, with local R-squared values ranging from 0.530 to 0.565. This variation, although modest, indicates that the model’s explanatory power differs across regions, being higher in the southern and southeastern districts than in the northern areas. Such differences suggest that predictors like education and urbanisation have varying levels of influence in different geographic contexts, emphasising the importance of spatial analysis in understanding financial inclusion.
Geospatial analysis of the coefficients and significance levels highlights that urban population percentage and average education level are significant positive predictors of mobile money usage in most regions. However, their impact is not uniform across the country. For urbanisation, the southern and southeastern regions, where the model fits better, show stronger positive relationships, possibly due to better infrastructure and higher concentrations of services. For education, the coefficient is positive and significant at the 0.05 level in all 139 districts, and it is largest in the north and northeast (up to 18.29) and smallest in the south and west (down to 15.49). Where coefficients are lower or less significant, other local factors might be influencing mobile money usage.
Conversely, the significant negative relationship between SACCO (Savings and Credit Cooperative Organization) usage percentage and mobile money usage is also spatially variable. The negative impact is more pronounced and statistically significant in certain northern districts, suggesting that traditional financial institutions like SACCOs may be more deeply rooted in these areas. This could imply competition between SACCOs and mobile money platforms, affecting the adoption rates of the latter. The geospatial distribution of this relationship highlights the need to consider local financial ecosystems when promoting mobile money services.
The geospatial patterns observed in the GWR analysis underscore the importance of considering spatial heterogeneity when addressing financial inclusion. The varying influence of different predictors across regions indicates that a one-size-fits-all approach may not be effective. Region-specific strategies are necessary to address the unique challenges and leverage the strengths of each area. For instance, enhancing urban infrastructure and educational opportunities in regions where these factors significantly boost mobile money usage could be prioritized. In areas where SACCO usage negatively impacts mobile money adoption, integrating mobile money services with existing SACCO operations or promoting awareness of the benefits of mobile money could mitigate this effect.
Understanding Tanzania’s diverse geographical landscape—including its complex physical features, economic zones, and varying levels of infrastructure development—is crucial for interpreting the spatial patterns observed in the GWR analysis. The disparities in model performance and predictor significance are deeply intertwined with the country’s varied economic activities, population distribution, and accessibility to services. For instance, the agricultural regions in the Southern Highlands and Lake Victoria basin, where reliance on farming and a predominantly rural population prevail, may exhibit lower mobile money usage due to limited financial infrastructure and lower education levels. Conversely, areas rich in tourism—such as the Northern Circuit with the Serengeti and Mount Kilimanjaro, and coastal regions like Dar es Salaam and Zanzibar—benefit from better infrastructure, higher economic activity, and greater technological adoption, leading to increased mobile money usage. Infrastructure development varies significantly, with urban centers enjoying advanced facilities that promote financial inclusion, while rural and remote areas lag due to challenging terrains and sparse populations. These factors influence the spatial dynamics of mobile money adoption, as reflected in the varying significance of predictors like urbanization and education across different regions. Recognizing these geospatial nuances allows for a more accurate analysis that respects the local context, emphasizing the need for region-specific strategies—such as tailored financial services for agricultural communities or leveraging the existing infrastructure in tourism hubs—rather than imposing assumptions based on experiences from more urbanized countries like Singapore.
In conclusion, the GWR analysis not only identifies urbanization and education as key drivers of mobile money usage but also highlights how their effects vary geographically. The spatial variability in both model performance and predictor influence emphasizes the need for geospatially informed policymaking. Tailoring interventions to the specific needs and characteristics of each region can enhance the effectiveness of efforts to promote financial inclusion. By leveraging geospatial analysis, policymakers and stakeholders can develop targeted strategies that address the unique spatial dynamics influencing mobile money adoption across Tanzania. This geospatial approach is essential for overcoming geographic barriers, supporting underserved regions, and ultimately contributing to the country’s socio-economic development goals.